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BAYESIAN INFERENCE AND THE PARAMETRIC BOOTSTRAP 



By Bradley Efron 1 

Stanford University 

The parametric bootstrap can be used for the efficient computa- 
tion of Bayes posterior distributions, fmportance sampling formulas 
take on an easy form relating to the deviance in exponential fami- 
lies and are particularly simple starting from Jeffreys invariant prior. 
Because of the i.i.d. nature of bootstrap sampling, familiar formulas 
describe the computational accuracy of the Bayes estimates. Besides 
computational methods, the theory provides a connection between 
Bayesian and frequentist analysis. Efficient algorithms for the fre- 
quentist accuracy of Bayesian inferences are developed and demon- 
strated in a model selection example. 

1. Introduction. This article concerns the use of the parametric boot- 
strap to carry out Bayesian inference calculations. Two main points are 
made: that in the comparatively limited set of cases where bootstrap meth- 
ods apply, they offer an efficient and computationally straightforward way 
to compute posterior distributions and estimates, enjoying some advantages 
over Markov chain techniques; and, more importantly, that the parametric 
bootstrap helps connect Bayes and frequentist points of view. 

The basic idea is simple and not unfamiliar: that the bootstrap is use- 
ful for importance sampling computation of Bayes posterior distributions. 
An important paper by Newton and Raftery (1994) suggested a version of 
nonparametric bootstrapping for this purpose. By "going parametric" we 
can make the Bayes/bootstrap relationship more transparent. This line of 
thought has the advantage of linking rather than separating frequentist and 
Bayesian practices. 

Section 2 introduces the main ideas in terms of an elementary one-parame- 
ter example and illustrates a connection between Jeffreys invariant prior den- 
sity and second-order accurate bootstrap confidence limits. Both methods 
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are carried out via reweighting of the original "raw" bootstrap replications. 
The calculation of posterior distributions by bootstrap reweighting is a main 
theme here, in constrast to Markov chain methods, which strive to directly 
produce correctly distributed posterior realizations. 

Multidimensional exponential families, discussed in Section 3, allow the 
Bayes/bootstrap conversion process to be explicitly characterized. Two im- 
portant families, multivariate normal and generalized linear models, are in- 
vestigated in Sections 4 and 5. Jeffreys- type priors can yield unsatisfactory 
results in multiparameter problems [Ghosh (2011)], as shown here by com- 
parison with bootstrap confidence limits. 

An advantage of bootstrap reweighting schemes is the straightforward 
analysis of their accuracy. Section 6 develops accuracy estimates for our 
methodology, both internal (How many bootstrap replications are neces- 
sary?) and external (How much would the results vary in future data sets?). 
The latter concerns the frequentist analysis of Bayesian estimates, an impor- 
tant question in "objective Bayes" applications; see, for instance, Gelman, 
Meng and Stern (1996) and Berger (2006). 

Bootstrap reweighting can apply to any choice of prior (not favoring con- 
venience priors such as the conjugates, e.g.), but here we will be most inter- 
ested in the objective-type Bayes analyses that dominate current practice. 
Jeffreys priors are featured in the examples, more for easy presentation than 
necessity. The paper ends with a brief summary in Section 7. Some technical 
details are deferred to the Appendix. 

Connections between nonparametric bootstrapping and Bayesian infer- 
ence emerged early, with the "Bayesian bootstrap," Rubin (1981) and Efron 
(1982). Bootstrap reweighting is deployed differently in Smith and Gelfand 
(1992), with a nice example given in their Section 5. Sections 4 and 6 of 
Efron and Tibshirani (1998) develop bootstrap reweighting along the lines 
used in this paper. 

2. Conversion and reweighting. Our methodology is introduced here in 
terms of a simple one-parameter problem. Table 1 shows scores for n = 22 
students on two tests, "mechanics" and "vectors," having sample correlation 

(2.1) (9 = 0.498. 

Table 1 

Scores of 22 students on two tests, "mechanics" and "vectors" [from Mardia, Kent and 
Bibby (1979), a randomly chosen subset of the 88 students in their Table 1,2.1]. 
The sample correlation is 6 = 0.498 
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We wish to calculate some measure of posterior distribution for the true 
underlying parameter value 

(2.2) 9q = correlation (mechanics score, vectors score). 

As in Mardia, Kent and Bibby (1979), we assume that the individual 
student scores y^ = (mecj, vecj) are a random sample from a bivariate normal 
distribution having unknown mean vector [i and covariance matrix S, 

(2.3) y:w^M(M,S) for i = 1,2,..., 22 

with y = (yi,y2, ■ ■ ■ , 2/22) representing the full data set. Let (jl, S) denote the 
usual maximum likelihood estimate (MLE). Then a parametric bootstrap 
sample y* follows (2.3), with (/t,E) replacing (//,£), 

(2.4) y*:y*^ d Ar 2 (fL,t) for i = 1, 2, . . . , 22. 

The sample correlation of y* is a parametric bootstrap replication of 9, 
say, 9*. A total of B = 10,000 parametric bootstrap samples y* were in- 
dependently generated according to (2.4), and the corresponding 9* values 
calculated. We will denote them simply as 

(2.5) 6*i, 9 2 , . ■ ■ ,9i, . . . ,9b 

with 9i short for 9*. 

The histogram in Figure 1 compares the distribution of the 10,000 9i's 
with Fisher's theoretical density function fe(9), 

§ (n - 2)(1 - _ g2 )( n-4)/2 roo dw 



(cosh w 

where 9 has been set equal to its MLE value 0.498. In this sense /o.49s( - ) 
is the ideal parametric bootstrap density we would obtain if the number of 
replications B approached infinity. Chapter 32 of Johnson and Kotz (1970) 
gives formula (2.6) and other representations of fg(9). 
Figure 1 also indicates the exact 95% confidence limits 

(2.7) #0 £ (0.093, 0.741), 

2i% noncoverage in each tail, obtained from fo(9) by the usual construction, 

(2.8) f fo.093(0)d9 = 0.025 

J0.498 

and similarly at the upper endpoint. 

Suppose now 2 we have a prior density n(9) for the parameter 9 and wish 
to calculate the posterior density ir(0\9). For any subset A of the parameter 



2 For this example we reduce the problem to finding the posterior distribution of 
given 9, ignoring any information about 9 in the part of (/}, S) orthogonal to 9. Our 
subsequent examples do not make such reductions. 
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Fig. 1. Histogram of B = 10,000 bootstrap replications for the student score correlation 
coefficient (2.4)-(2.5) scaled to integrate to 1. Solid curve is Fisher's density formula (2.6) 
for 9 = 0.498. Triangles indicate the exact 95% confidence interval 9 £ (0.093,0.741). 



space G = [—1,1], 

(2.9) Pr{0 e A\9} = J n(0)fg(§) dO / J n(9)f e (9) dO 

according to Bayes rule. 

Define the conversion factor R{9) to be the ratio of the likelihood function 
to the bootstrap density, 

(2.10) R{6) = fo(0)/f § (0). 

Here 9 is fixed at its observed value 0.498 while 9 represents any point in 0. 
We can rewrite (2.9) as 

( , u) ^m J r { ZXl - 

J e ir(9)R(9)f § {9) d9 
More generally, if t(9) is any function 9, its posterior expectation is 



(2.12) E{t(9)\9} 



J @ t(9M9)R(9)f § (9)d9 
J e 7T(9)R(9)f § (9)d9 ■ 



The integrals in (2.11) and (2.12) are now being taken with respect to 
the parametric bootstrap density /«(•)• Since 9\, 92, ■ ■ ■ , 9b (2.5) is a random 
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Fig. 2. Heavy curve is the posterior density ir(0\8) for the correlation (2.2), starting from 
Jeffreys prior (2.14), obtained by reweighting the B = 10,000 bootstrap replications (2.5); 
triangles show 95% credible limits 6q G (0.095,0.748). Light dashed curve is raw unweighted 
bootstrap distribution. Beaded curve is BCa weighted bootstrap density (2.17), nearly the 
same as tt(0\6) in this case. 



sample from /§(■), the integrals can be estimated by sample averages in the 
usual way, yielding the familiar importance sampling estimate of E{t(9)\9} , 

B B 

(2.13) E{t(e)\Q} = J2tiKiRi/Y, 7riRi > 

i=l i=l 

where ti = t(9i) ,7Ti = ir(9i) , and Ri = R(6i). Under mild regularity condi- 
tions, the law of large numbers implies that E{t(9)\9} approaches E{t(9)\9} 
as B — > oo. (The accuracy calculations of Section 6 will show that in this 
case B = 10,000 was larger than necessary for most purposes.) 

The heavy curve in Figure 2 describes n(0\8), the estimated posterior 
density starting from Jeffreys prior 

(2.14) tt(0) = 1/(1 - 6 2 ) 

(see Section 3). The raw bootstrap distribution puts weight 1/B on each 
of the B replications By reweighting these points proportionately to 
Wi = TTiRi, we obtain the estimated posterior distribution of 6 given 9, with 

B 

(2.15) Pr{9eA\§}= Y,™i/Y, Wi '> 
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tt(6\0) represents the density of this distribution — essentially a smoothed 
histogram of the 10,000 #j's, weighted proportionally to Wi. 

Integrating %{6\9) yields the 95% credible limits (2^% posterior probabil- 
ity in each tail) 

(2.16) #o £ (0.095, 0.748), 

close to the exact limits (2.7). Prior (2.14) is known to yield accurate fre- 
quentist coverage probabilities, being a member of the Welch-Peers family 
discussed in Section 4. 

In this case, the weights Wi = itiRi can be thought of as correcting the 
raw unweighted (wi = 1) bootstrap density. Figure 2 shows the correction 
as a small shift leftward. BCa, standing for bias- corrected and accelerated, 
is another set of corrective weights, obtained from purely frequentist con- 
siderations. Letting G{9) denote the usual empirical cumulative distribution 
function (c.d.f.) of the bootstrap replications 61,82, ■ ■ ■ ,9b, the BCa weight 
on 0i is 

(2.17) w i -7— ^—7 : r [zei-® ^(Pi) ~ ^0 , 

(1 + az0iY(p{zei + z ) 

where (p and ® are the standard normal density and c.d.f., while zq and a are 
the bias- correction and acceleration constants developed in Efron (1987) and 
DiCiccio and Efron (1992), further discussed in Section 4 and the Appendix. 
Their estimated values are zq = —0.068 and a = for the student score 
correlation problem. 

The BCa density 7r BCa (0|0), obtained by reweighting as in (2.15), is seen 
in Figure 2 to nicely agree with the Jeffreys posterior density, being slightly 
heavier in the left tail, with 95% central interval #0 £ (0.074,0.748). This 
agreement is misleadingly soothing, as will be seen in the multidimensional 
context of Section 4. 



3. Exponential families. The Bayes/bootstrap conversion process takes 
on a simplified form in exponential families. This facilitates its application 
to multiparameter problems, as discussed here and in the next two sections. 

The density functions for a p-parameter exponential family T can be 
expressed as 

(3.1) / /3 (/3) = e Q '^)/o(/3), 

where the p-vector a is the canonical parameter, ft is the p-dimensional 
sufficient statistic vector, and where ip(a), the cumulant generating function, 
provides the multipliers necessary for fp(ft) integrating to 1. Here we have 
indexed the family by its expectation parameter vector ft, 

(3.2) ft = E a {ft} 
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for the sake of subsequent notation, but a and j3 are one-to-one functions 
and we could just as well write f a ($). 

The deviance between any two members of J- is 

(3-3) D(J3 1 ,p 2 )=2E fh {]o e (ff h 0)/ff h 0))} 

[denoted equivalently D{oti,ot2) since deviance does not depend on the pa- 
rameterization of JF\. Taking logs in (3.1) shows that 

(3.4) £>(&, #j)/2 = (at - a 2 )'Pi - W>(<*i) - ^M). 

Then family (3.1) can be re-expressed in "Hoeffding's form" as 

(3-5) fm = fpW)e- D ^ )/2 . 

Since D((3,f3) is equal to or greater than zero, (3.5) shows that f3 = $ is 
the MLE, maximizing fa($) over all choices of /3 in B, the space of possible 
expectation vectors. 

Parametric bootstrap replications of (3 are independent draws from /^(-)i 

(3.6) /£(•)— A, 

where /3j is shorthand notation for f3* . Starting from a prior density 7r(/3) on 
B, the posterior expectation of any function t(/3) given (3 is estimated by 

B B 

(3.7) E{t{p)\p] =Y,mxPiWi) w) 

i=l i=l 

as in (2.13), with R(f3) the conversion factor 
(3-8) R(P) = f p 0)/f $ (l3). 

Note: 7r(/3)i?(/3) is transformation invariant, so formula (3.7) produces the 
same numerical result if we bootstrap a±, ■ ■ ■ ,ctB instead of (3.6), or for 
that matter bootstrap any other sufficient vector. See Section 4. 
Hoeffding's form (3.5) allows a convenient expression for i?(/3): 

Lemma 1. Conversion factor (3.8) equals 

(3.9) R((3)=me AW , 
where 

(3-10) t(P) = f00)/fM 
and 

(3.H) A(P) = [D(pJ)-D0,P)]/2. 
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Letting d be the canonical parameter vector corresponding to f3, (3.4) 
gives 

(3.12) A(/3) = (a - a)'(P + $) - 2[ifj(a) - ^(a)], 

which is useful for both theoretical and numerical computations. 

The derivatives of ip with respect to components of a yield the moments 
of/3, 

(3.13) Tp(a) = (dip/daj) = P, ${a) = (d 2 ip/dajda k ) = V{a) = cov a 0} 
and 

(3.14) #(a) = (d 3 ^/d aj da k da t ) = U(a), 

Ujki(oi) = E a (j3j — /3j)(j3k ~ Pk){$i ~ A)- I n repeated sampling situations, 
where /3 is obtained from n independent observations, the entries of V(a) and 
U{a) are typically of order 0{n~ l ) and (9(n~ 2 ), respectively; see Section 5 
of Efron (1987). 

The normal approximation 

(3.15) $ ~ M p (P,V(a)) 
yields 

(3.16) fp{P) = {2^l 2 \V{a)r l/2 and f $ 0) = (2vr)^/ 2 |F(d)r 1/2 , 
so 

(3.17) e(/3) = in«)i 1/2 /in«)i 1/2 - 

Because (3.16) applies the central limit theorem where it is most accurate, 
at the center, (3.17) typically errs by a factor of only 1 + 0(l/n) in repeated 
sampling situations; see Tierney and Kadane (1986). In fact, for discrete 
families like the Poisson, where fp{fi) is discontinuous, approximation (3.17) 
yields superior performance in applications of (3.9) to (3.7). In what follows 
we will treat (3.17) as exact rather than approximate. 

Jeffreys invariant prior density, as described in Kass and Wasserman 
(1996), takes the form 

(3.18) ir JeS (p) = c\V(a)\~ 1/2 

in family (3.1), with c an arbitrary positive constant that does not affect 
estimates such as (3.7). Ignoring c, we can use (3.17) and (3.18) to rewrite 
the conversion factor R((3) (3.9) as 

(3.19) i?(/?) = e A(/3) /7r Jeff (/3)- 

Jeffreys prior is intended to be "uninformative." Like other objective pri- 
ors discussed in Kass and Wasserman, it is designed for Bayesian use in 
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situations lacking prior experience. Its use amounts to choosing 

(3.20) tt(j9)-R(/3) = e A ^ 

in which case (3.7) takes on a particularly simple form: 

Lemma 2. If is Jeffreys prior (3.18), then (3.7) equals 

B B 

(3.21) ew)\p) =!>(&) eA(ft) /^ eA(ft) 

i=l i=l 
with A(/3) as in (3.11) and (3.12). 

The normal translation model f3 ~ M P (P,T,), with £ fixed, has A(/3) = 
0, so that the Bayes estimate t in (3.21) equals the unweighted bootstrap 
estimate t, 

B 

(3.22) t = E{t((3)\P} = Y,ti/B = t. 

i=l 

Usually though, t will not equal i, the difference relating to the variability 
of AOS) in (3.21). 

A simple but informative result concerns the relative Bayesian difference 
(RBD) of t(/3) defined to be 

(3.23) RBD(t) = (t-t)/sd(t), 

Lemma 3. Letting ri = iriRi, the relative Bayesian difference of t((3) is 

(3.24) RBD(t) = cor(t, r) • cv(r) 
and i/vr(/3) = ^ Jcff (/5), 

(3.25) RBD(i) = cof(i,r) -sd(A); 

here cor (t, r) is the empirical correlation between ti and ri for the B bootstrap 
replications, cv(r) the empirical coefficient of variation of the rj values, and 
sd(A) the empirical standard deviation of the Aj values. 

Proof. Equation (3.24) follows immediately from (3.7), 

(3.26) RBD(t) = = cor (t, r )— — , 

sd(f)Xa ^i/^ r 

If vr(/3) is the Jeffreys prior (3.18), then r(/3) = exp(A(/?)) (3.19) and the 
usual delta-method argument gives cv(r) = sd(A). □ 
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The student score example of Figure 2 (which is not in exponential family 
form) has, directly from definition (3.23), 

,3.27) HHD W = ™|=™»= -0.101. 

which is also obtained from (3.24) with cof(t, r) = —0.945 and cv(r) = 0.108. 
Notice that the cv(r) factor in (3.24), and likewise sd(A) in (3.25), apply 
to any function t(f3), only the cof(t,r) factor being particular. The multipa- 
rameter examples of Sections 3 and 4 have larger cv(r) but smaller cor(£,r), 
again yielding rather small values of RBD(t). All of the Jeffreys prior ex- 
amples in this paper show substantial agreement between the Bayes and 
unweighted bootstrap results. 

Asymptotically, the deviance difference A(/3) depends on the skewness of 
the exponential family. A normal translation family has zero skewness, with 
A(/3) = and R((3) = 1, so the unweighted parametric bootstrap distribu- 
tion is the same as the flat-prior Bayes posterior distribution. In a repeated 
sampling situation, skewness goes to zero as n -1 / 2 , making the Bayes and 
bootstrap distributions converge at this rate. We can provide a simple state- 
ment in one-parameter families: 

Theorem 1. In a one-parameter exponential family, A(/3) has the Tay- 
lor series approximation 

(3.28) A(/3) = ± 7 Z 3 [Z = y- 1 /2 (/3 _ / 3 )]; 

where V and 7 are the variance and skewness of f3 ~ /«(")■ ^ n lo,rge-sample 
situations, Z ~ M(0, 1) andj is 0{n~ 1 / 2 ) , making A(/3) of order O p {n~ 1 ^ 2 ). 

(The proof appears in the Appendix, along with the theorem's multipa- 
rameter version.) 

As a simple example, suppose 

(3.29) $ ~ ^Gamma n /n [/3 G (0, 00)] , 

so /3 is a scaled version of a standard Gamma variate having n degrees of 
freedom. In this case, 

(3.30) A(/3) = ^=Z 3 withZ = v ^(|-l), 

making A(/3) an increasing cubic function of /3. The cubic nature of (3.28) 
and (3.30) makes reweighting of the parametric bootstrap replications by 
exp(Aj) more extreme in the tails of the distribution than near (3. 

Stating things in terms of conditional expectations E{t(f3)\f3} as in (3.7) 
is convenient, but partially obscures the basic idea: that the distribution 
putting weight proportional to Wi = i^iRi on approximates the posterior 
distribution 7r(/3|/3). 
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As an example of more general Bayesian calculations, consider the "pos- 
terior predictive distribution," 

(3.3i) g(y) = f*(P\P)9fi(y)dp, 

where y is the original data set yielding /3; by sufficiency as in (2.3), it 
has density functions gp{y) = fp(j3)h(y\j3). For each we sample y** from 
Then the discrete distribution putting weight proportional to to, on 
y**, for i = 1, 2, . . . , B, approximates g(y). See Gelman, Meng and Stern 
(1996). 

4. The multivariate normal family. This section and the next illustrate 
Bayes/bootstrap relationships in two important exponential families: the 
multivariate normal and generalized linear models. A multivariate normal 
sample y comprises n independent d-dimensional normal vector observations 

(4.1) y : yi~ d A^(/i,S), i = l,2,...,n. 

This involves p = d- {d + 3)/2 unknown parameters, d for the mean vector 
fj, and d ■ (d + l)/2 for the covariance matrix E. We will use 7 to denote 
the vector of all p parameters; 7 is not the expectation vector (3 (3.2), but 
rather a one-to-one quadratic function 7 = m(/3) described in formula (3.5) 
of DiCiccio and Efron (1992). 

The results of Section 3 continue to hold under smooth one-to-one trans- 
formations 7 = m(/3). Let /y( / y) denote the density of the MLE 7 = m(/3), 
and likewise R^) = f~i(l) / f^{l) for the conversion factor, .0(71,72) for 
the deviance, A(7) = [0(7,7) — 0(7,7)]/2 for the deviance difference, and 
7r Jeff (7) for Jeffreys prior. Then Lemma 1 continues to apply in the trans- 
formed coordinates: 

(4.2) R(i) = |(7)e A(7) [|(7) = Hl)/Ul)\- 
(See the Appendix.) 

A parametric bootstrap sample 

(4.3) / 7 (-) — ► 71, 72,.. .,7s 

approximates the conditional expectation of a function t(j), starting from 
prior #(7), by 

B B 

(4.4) E{t(l)\l} = Yl foiRi I 

i=l i=i 

as in (2.14), and if #(7) is Jeffreys prior, 

B B 

(4.5) £{i(7)|7} = E^ Al /E eAl 

i=i i=i 
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as in (3.21). This can be particularly handy since A is tranformation invari- 
ant and can be evaluated in any convenient set of coordinates, while 7r (7) 
need not be calculated at all. 

The following theorem provides £(7) and -^(7) for a multivariate normal 
sample (4.1), working with 7 the p = d- (d + 3)/2 coordinates consisting of 
fj, and the elements of E on or above its main diagonal: 

Theorem 2. In (fj,, E) coordinates, 

(4.6) i(^) = (m/\n) {d+2)/2 

and 

( E _1 - XT 1 
A(/i, E) = ra<^ (/i - p)' (fi - /}) 



(4.7) 



ti^EE™" 1 -EE" 1 ) |E| 

+ - — 2 - +log U 



(Proof in the Appendix.) 

Here l/£(/x,E) turns out to be exactly proportional to |V r (7)| -1 ^ 2 , and 
either expression gives 7r Jcff (/z,E). Expression (4.7) equals the deviance dif- 
ference (3.11), no matter what the choice of coordinates. 

Theorem 2 makes it easy to carry out parametric bootstrapping: having 
calculated the usual MLE estimates (/t, E), each bootstrap data set y* is 
generated as in (4.1), 

(4.8) y*:y*~N d (jl,£), i = l,2,...,n, 

from which we calculate the bootstrap MLE estimate (/(*, E*), denoted sim- 
ply (fx, E) as before. To each of B such replicates 

(4.9) (/x, E)i, (//, E) 2 , ...,{n, E)j, ...,{n, E) B 
is attached the weight 

(4.10) Wi = 

using Theorem 2 (or more exactly W{j ^2f Wj); this distribution, supported 
on the B points (4.9), estimates the posterior distribution of (/x, E) given 
(/i,E). Expectations are then obtained as in (4.4), and similarly for more 
general posterior parameters such as percentiles and credible limits. 

Figure 3 applies this methodology to the student score data of Table 1, 
assuming the bivariate normal model (2.3). We take the parameter of interest 
6 to be the eigenratio 

(4.11) = t(/i,E) = Ai/(Ai + A 2 ), 

where Ai and A 2 are the ordered eigenvalues of E; 6 has MLE 9 = t(£i, E) = 
0.793. 
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.793 

1 1 1 1 

0.6 0.7 0.8 0.9 

eigenratio 

Fig. 3. Heavy curve is Bayes posterior density for the eigenratio (4-11), starting from 
Jeffreys prior for a bivariate normal model; solid triangles show 95% credible limits 
(0.650,0.908). Beaded curve is BCa confidence density based on weights (2.17) with 
Zo = —0.222, a = 0; BCa 95% interval (0.598,0.890), open triangles, is shifted far leftward. 
Light dashed curve is unweighted bootstrap density. 



B = 10,000 bootstrap replications were generated as in (4.9), and ij = 
£((//, £)j) calculated for each. Total computation time was about 30 sec- 
onds. The heavy curve shows the estimated posterior density of 6 given 
(ft, E), starting from Jeffreys prior. The 95% credible region, l\% probabil- 
ity excluded in each tail, was 

(4.12) Bayes: £ (0.650, 0.908). 
That is, 

B 

(4.13) e Al /^ eAl =0 - 025 

<i<0.650 1 

and similarly for the upper endpoint. 

In this case the BCa 95% confidence limits are shifted sharply leftward 
compared to (4.12), 

(4.14) BCa: 6 € (0.598, 0.890). 

The beaded curve in Figure 3 shows the full BCa confidence density, that is, 
the estimated density based on the BCa weights (2.17). For the eigenratio, 
zq = —0.222 and a = are the bias correction and acceleration constants. 
See the Appendix for a brief discussion of the zq and a calculations. 
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eigenratio bootstrap replications 

Fig. 4. Solid curve: BCa weights (2.17), with (z ,a) = (-0.222,0), plotted versus boot- 
strap eigenratio replications 8i. Dashed curve; regression of Jeffreys prior Bayes weights 
exp(Ai) on 6\. 

Figure 4 helps explain the difference between the Bayes and BCa results. 
The heavy curve shows the BCa weights (2.17) increasing sharply to the 
left as a function of 9i = i((/z, the bootstrap eigenratio values. In other 
words, smaller values of 9i are weighted more heavily, pulling the weighted 
percentile points of the BCa distribution downward. On the other hand, the 
Bayes weig hts n^Ri = exp(Aj) (represented in Figure 4 by their regression 
on 9i) are nearly flat, so that the Bayes posterior density is almost the same 
as the unweighted bootstrap density shown in Figure 3. 

The BCa limits are known to yield highly accurate coverage probabilities; 
see DiCiccio and Efron (1996). Moreover, in the eigenratio case, the MLE 
9 is strongly biased upward, suggesting a downward shift for the confidence 
limits. This brings up a familiar complaint against Jeffreys priors, extensively 
discussed in Ghosh (2011): that in multiparameter settings they can give 
inaccurate inferences for individual parameters of interest. 

This is likely to be the case for any general-purpose recipe for choosing 
objective prior distributions in several dimensions. For instance, repeating 
the eigenratio analysis with a standard inverse Wishart prior on £ (covari- 
ance matrix /, degrees of freedom 2) and a flat prior on /i gave essentially 
the same results as in Figure 3. Specific parameters of interest require specif- 
ically tailored priors, as with the Bernardo-Berger reference priors, again 
nicely reviewed by Ghosh (2011). 
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In fact, the BCa weights can be thought of as providing such tailoring: 
define the BCa prior (relative to the unweighted bootstrap distribution) to 
be 

(4.15) Tif c& = wf c& /Ri 

with wf Ca as in (2.17). This makes the posterior weights irf Ca "Ri appearing 
in expressions like (3.7) equal the BCa weights w;f Ca , and makes posterior 
credible limits based on the 7r BCa prior equal BCa limits. Formula (4.15) 
can be thought of as an automatic device for constructing Welch and Peers' 
(1963) "probability matching priors;" see Tibshirani (1989). 

Importance sampling methods such as (4.5) can suffer from excessive vari- 
ability due to occasional large values of the weights. The "internal accuracy" 
formula (6.2) will provide a warning of numerical problems. A variety of help- 
ful counter-tactics are available, beginning with a simple truncation of the 
largest weight. 

Variations in the parametric bootstrap sampling scheme can be employed. 
Instead of (3.6), for instance, we might obtain Pi, fa, ■ ■ ■ ,Pb from 

(4.16) A ~ d AT p (A^,M^)), 

where jlp and are the observed mean and covariance of /3's from a pre- 
liminary /g(-) sample. Here h(Ep) indicates an expansion of designed to 
broaden the range of the bootstrap distribution, hence reducing the impor- 
tance sampling weights. If a regression analysis of the preliminary sample 
showed the weights increasing in direction v in the j3 space, for example, then 
h(Yip) might expand in the v direction. Devices such as this become more 
necessary in higher-dimensional situations, where extreme variability of the 
conversion factor R(fii) may destabilize our importance sampling computa- 
tions. 

Replacing (3.6) with (4.16) changes the conversion factor R(j3) (3.8), but 
in an easily computable way. In fact, replacing (3.6) with ~ M p {jlp, Xg) 
makes the calculation of R(/3) easier in situations where there is no simple 
formula for the bootstrap density fa([3)- 

5. Generalized linear models. The Bayes/bootstrap conversion theory 
of Section 3 applies directly to generalized linear models (GLM). A GLM 
begins with a one-parameter exponential family 

(5-1) 9r,(y) = e^-^g (y), 

where ry = a,y = /3, and <j){r]) = ip(ct) in notation (3.1). An n x p structure 
matrix X and a p-dimensional parameter vector a then yield an n-vector 
r\ = Xa, with each entry rjj governing an independent observation yj, 

(5.2) yj ~ g Vj (•) for j = 1,2, . . . ,n. 



16 



B. EFRON 



All of this results in a p-parameter exponential family (3.1), with a the 
canonical parameter vector. Letting fj, be the expectation vector of y = 
(yi,-..,y n )', 

(5.3) » = E a {y}, 
the other entries of (3.1) are 

n 

(5.4) P = X'y, P = X'p and i(>(a) = ^ <K*ja), 

i=l 

where x,- is the jth row of X. The deviance difference A(/3) (3.11) has a 
simple form, 

71 

Q) , (/3 + ^)-2^[0(x j a)-</)(x i a)] 
i=i 

n 

r7)'(/x + A)-2^[^)-^%)] 

i=i 

[a the MLE of a, r) = Xd, and /i the expectation vector (5.3) corresponding 
to a = a] according to (3.12). 

As an extended example we now consider a microarray experiment dis- 
cussed in Efron (2010), Section 2.1: 102 men, 50 healthy controls and 52 
prostate cancer patients, having each had the activity of N = 6033 genes 
measured [Singh et al. (2002)]. A two-sample test comparing patients with 
controls has been performed for each gene, yielding a z-value z k , that is, 
a test statistic having a standard normal distribution under Hq^, the null 
hypothesis of no patient/control difference for gene k, 

(5.6) H ok :z k ~JV(0,1). 

The experimenters, of course, are interested in identifying nonnull genes. 

Figure 5 shows a histogram of the iV z- values. The standard normal curve 
is too high in the center and too low in the tails, suggesting that at least 
some of the genes are nonnull. The better-fitting curve "model 4" is a fit 
from the Poisson regression family discussed next. 

There are J = 49 bins for the histogram, each of width 0.2, with centers 
xj ranging from —4.4 to 5.2. Let yj be the number of z k values in the jth 
bin, 

(5.7) yj = #{z k ebmj}, j = 1,2,..., J = 49. 

We will assume that the yj's are independent Poisson observations, each 
having its own expectation jij, 

(5.8) y/^Poi^), j = 1,2,..., J, 



A(0) = (a 



(5.5) 



(r; 
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Fig. 5. Histogram of the N — 6033 z -values from the prostate cancer study, Singh et al. 
(2002). Standard normal curve (dashed) is too high at center and too low in the tails. 
"Model 4, " solid curve, is the fit from a fourth-degree polynomial Poisson regression. 

and then fit curves to the histogram using Poisson regression. Why this might 
be appropriate is discussed at length in Efron (2008, 2010), but here we will 
just take it as a helpful example of the Bayes/bootstrap GLM modeling 
theory. 

We consider Poisson regression models where the canonical parameters 
rjj = log(fij) are mth-degree polynomial functions of the bin centers Xj, eval- 
uated by glm(y~poly(x,m) , Poisson) in the language R. This is a GLM 
with the Poisson family, m = log/x,-, where X is a J x (m + 1) matrix having 
rows Xj = (l,Xj , Xj, . . . , xj 1 ) for j = 1, 2, . . . , J. For the Poisson distribution, 
4>{rj) = [i in (5.1). The deviance difference function (5.5) becomes 

(5.9) A(/3) = (r ? -77)'(^ + A)-2-l , (/i-/i) 

with 1 a vector of J ones. 

Let "Mm" indicate the Poisson polynomial regression model of degree m. 
M2, with log(/x,) quadratic in Xj, amounts to a normal location-scale model 
for the marginal density of the Zk's. Higher-order models are more flexible. 
M4, the quartic model, provided the heavy fitted curve in Figure 5. Table 2 
shows the Poisson deviance for the fitted models M2 through M8. A dramatic 
decrease occurs between M3 and M4, but only slow change occurs after that. 
The AIC criterion for model m, 



(5.10) 



AIC(m) = Deviance + 2 • (m + 1) 
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Table 2 

Deviance from Poisson polynomial regression models for counts ( 5. 7), prostate data; 
AIC criterion (5.10) is minimized for the quartic model M4- Boot % shows the 
proportion of each model selected in B — 4000 bootstrap replications of the AIC criterion, 
bootstrapping from M8. Bayes % are weighted Bayes posterior proportions, 
assuming Jeffreys prior. The St Error column is obtained from the 
bootstrap- after-bootstrap calculations of Section 6 



Model 


Deviance 


AIC 


Boot % 


Bayes % 


(St Error) 


M2 


138.6 


144.6 


0% 


0% 


(0%) 


M3 


137.1 


145.1 


0% 


0% 


(0%) 


M4 


65.3 


75.3 


32% 


36% 


(20%) 


M5 


64.3 


76.3 


10% 


12% 


(14%) 


M6 


63.8 


77.8 


5% 


5% 


(8%) 


M7 


63.8 


79.8 


1% 


2% 


(6%) 


M8 


59.6 


77.6 


51% 


45% 


(27%) 



is minimized at M4, though none of the subsequent models do much worse. 
The fit from M4 provided the "model 4" curve in Figure 5. 

Parametric bootstrap samples y* were generated from M4, as in (5.8), 

(5.11) yJ^PoiCAj) for j = 1,2,..., J 

with ftj the MLE values from M4. B = 4000 such samples were generated, 
and for each one the MLE a*, and also /3* (5.4), were obtained from the 
R call glm(y* ~poly(x,4) , Poisson). Using the simplified notation a = a* 
gives bootstrap vectors rj = Xa, /x = exp(?7) = (exp(r^)),/3 = X'fi, where X 
is the 49 x 5 matrix poly (x, 4), and finally A(/3) as in (5.9). [Notice that f3 
represents ft* here, not the "true value" (3 of (5.4).] 

The reweighted bootstrap distribution, with weights proportional to 

(5.12) Wi = e A ' on fa for i = 1, 2, . . . , B = 4000, 

estimates the posterior distribution of j3 given starting from Jeffreys 
prior. The posterior expectation of any parameter 9 = t{j3) is estimated by 

YsWiU/YsWi as in (3-21). 

We will focus attention on a false discovery rate (Fdr) parameter 6, 

(5.13) d(z) = Fdr(z) = [1 - &(z)]/[l - F(z)], 

where $ is the standard normal c.d.f. and F{z) is the c.d.f. of the Poisson 
regression model: in terms of the discretized situation (5.8), 

J 

(5.14) n«) = E^-/E^ 

Xj<Z 1 
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.154 Fdrhat(3)=.192 .241 



0.15 0.20 0.25 0.30 

Fdr(3)* 

Fig. 6. Posterior densities for 9 = Fdr(3) (5.13), prostate data, based on B = 4000 para- 
metric bootstrap replications (5.11) from the fourth-degree Poisson regression model M4- 
Solid curve Jeffreys Bayes posterior density, using (5.12); heavy dashed curve BCa confi- 
dence density (2.17). Both give 95% interval 6 £ (—0.154,-0.241). Light dashed curve is 
unweighted bootstrap density. Total computation time was about 30 seconds. 



(with a "half count" correction at z = Xj). Fdr(z) estimates the probability 
that a gene having its Zk exceeding the fixed value z is nonnull, as discussed, 
for example, in Efron (2008). 

Figure 6 concerns the choice z = 3. Using quartic model M4 to estimate 
the fij's in (5.14) yields point estimate 

(5.15) = Fdr(3) = 0.192. 

Fdr values near 0.2 are in the "interesting" range where the gene might be 
reported as nonnull, making it important to know the accuracy of (5.15). 

The B = 4000 bootstrap samples for M4 (5.11) yielded bootstrap repli- 
cations 61,62, ... ,6b- Their standard deviation is a bootstrap estimate of 
standard error for 6,se = 0.024, so a typical empirical Bayes analysis might 
report Fdr(3) = 0.0192 ± 0.024. A Jeffreys Bayes analysis gives the full pos- 
terior density of 6 shown by the solid curve in Figure 6, with 95% credible 
interval 

(5.16) M4: 6e (0.154,0.241). 

In this case the BCa density (2.17) [(z ,a) = (-0.047, -0.026)] is nearly the 
same as the Bayes estimate, both of them lying just slightly to the left of 
the unweighted bootstrap density. 
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0.15 0.20 0.25 0.30 

Fdr(3) bootreps 



Fig. 7. B — 4000 parametric bootstrap replications o/Fdr(3) from M8 (solid histogram) 
compared with those from M4 (line histogram). Closed triangles indicate 95% M8 credible 
limits (0.141,0.239); open triangles M4 limits (0.154,0.241). 

The choice of philosophy, Jeffreys Bayes or BCa frequentist, does not make 
much difference here, but the choice of model does. Repeating the analysis 
using M8 instead of M4 to generate the bootstrap samples (5.11) sharply 
decreased the estimate. Figure 7 compares the bootstrap histograms; the 
95% credible interval for Fdr(3) is now 

(5.17) M8: 9 e (0.141,0.239). 

AIC calculations were carried out for each of the 4000 M8 bootstrap 
samples. Of these, 32% selected M4 as the minimizer, compared with 51% 
for M8, as shown in the Boot % column of Table 2. Weighting each sample 
proportionally to exp(Aj) (5.12) narrowed the difference to 36% versus 45%, 
but still with a strong tendency toward M8. 

It might be feared that M8 is simply justifying itself. However, standard 
nonparametric bootstrapping (resampling the N values) gave slightly 
more extreme Boot percentages, 

(5.18) 30%(M4), 9%(M5), 4%(M6), 2%(M7), 54%(M8). 

The fact is that data-based model selection is quite unstable here, as the 
accuracy calculations of Section 6 will verify. 

6. Accuracy. Two aspects of our methodology's Bayesian estimation ac- 
curacy are considered in this section: internal accuracy, the bootstrap sam- 
pling error in estimates such as (3.7) (i.e., how many bootstrap replications 
B need we take?), and external accuracy, statistical sampling error, for in- 
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stance, how much would the results in Figure 3 change for a new sample of 
22 students? The i.i.d. (independent and identically distributed) nature of 
bootstrap sampling makes both questions easy to answer. 

Internal accuracy is particularly straightforward. The estimate (3.7) for 
E{t(/3)\$} can be expressed in terms of Sj = t^iRi and fj = KiRi as 

(6.1) E = s/f 

Let cov be the 2x2 empirical covariance matrix of the B vectors (sj,rj). 
Then standard delta-method calculations yield a familiar approximation for 
the bootstrap coefficient of variation of E, 

2 1 ( Css Csr C-rr 




(6.2) cV = - ^-2^ + 

B \ s z sr 

where c ss , c sr and c rr are the elements of cov. 

The Jeffreys Bayes estimate for eigenratio (4.11) was E = 0.799 (nearly 
the same as the MLE 0.793). Formula (6.2) gave cv = 0.002, indicating 
that E nearly equaled the exact Bayes estimate E{t{j5)\ j5}. B = 10,000 was 
definitely excessive. Posterior parameters other than expectations are han- 
dled by other well-known delta-method approximations. Note: Discontinu- 
ous parameters, such as the indicator of a parameter 6 being less than some 
value tend to have higher values of cv. 

As far as external accuracy is concerned, the parametric bootstrap can 
be employed to assess its own sampling error, a "bootstrap-after-bootstrap" 
technique in the terminology of Efron (1992). Suppose we have calculated 
some Bayes posterior estimate Q = Q(f3), for example, E or a credible limit, 
and wonder about its sampling standard error, that is, its frequentist vari- 
ability. As an answer, we sample K more times from f^(-), 

(6.3) fp{-) — ►7i,72,---)7X', 

where the 7 notation emphasizes that these replications are distinct from 
/3i,/?2, • • • ,(3b in (3.6), the original replications used to compute Q. Letting 
Qk = Q(lk), the usual bootstrap estimate of standard error for Q is 

K 1 1/2 

Y,(Qk-Q-) 2 /(K-i) 

_fc=l 

Q. = J2Qk/K. K = 200 is usually plenty for reasonable estimation of se(Q); 
see Table 6.2 of Efron and Tibshirani (1993). 

This recipe looks arduous since each Qfc requires B bootstrap replications 
for its evaluation. Happily, a simple reweighting scheme on the original B 
replications finesses all that computation. Define 

(6-5) W fci = / A (7fc)// ft C8). 



(6.4) se(Q) 
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Lemma 4. If Q is a posterior expectation E = ^ti^iRi/ ^7Tji?j, then 
the importance sampling estimate of Qk is 

B B 




i=l i=l 



for general quantities Q, reweighting (3i proportionately to iiiRiWi gives Qk- 
The proof of Lemma 4 follows immediately from 



which is the correct importance sampling factor for converting an /s(/3) 
sample into an fp^k) likelihood. Note: Formula (6.6) puts additional strain 
on our importance sampling methodology and should be checked for internal 
accuracy, as in (6.2). 

Formula (6.6) requires no new computations of i(/3) , vr(/3) or R((3), and in 
exponential families the factor Wki is easily calculated: 



where is the canonical vector in (3.1) corresponding to /?,,. This usu- 
ally makes the computation for the bootstrap-after-bootstrap standard error 
(6.4) much less than that needed originally for Q. [Formula (6.5) is invariant 
under smooth transformations of /3, and so Wki can be calculated directly 
in other coordinate systems as a ratio of densities.] 

A striking use of (6.4) appears in the last two columns of Table 2, Sec- 
tion 5. Let i4(/3«) be the indicator function of whether or not model 4 min- 
imized AIC for the ith bootstrap replication: E{t4({3)\(3} = 0.36 according 
to the Bayes % column. However, its bootstrap-after-bootstrap standard 
error estimate was se = 0.20, with similarly enormous standard errors for 
the other model selection probabilities. From a frequentist viewpoint, data- 
based model selection will be a highly uncertain enterprise here. 

Frequentist assessment of objective Bayes procedures has been advocated 
in the literature, for example, in Berger (2006) and German, Meng and Stern 
(1996), but seems to be followed most often in the breach. The methodology 
here can be useful for injecting a note of frequentist caution into Bayesian 
data analysis based on priors of convenience. 

If our original data set y consists of n i.i.d. vectors yi, as in Table 1, we can 
jackknife instead of bootstrapping the 7fe's. Now % is /3 recomputed from 
the data set y(j) having yi removed for k = 1, 2, . . . ,n. Formulas (6.5)-(6.8) 
still hold, yielding 



(6.7) 



RiW ki = f^(%)/fM 



(6.8) 



W ki = e 



(6.9) 
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An advantage of jackknife resampling is that the 7^ values lie closer to (3, 
making Wki closer to 1 and putting less strain on the importance sampling 
formula (6.6). 

7. Summary. The main points made by the theory and examples of the 
preceding sections follows: 

• The parametric bootstrap distribution is a favorable starting point for 
importance sampling computation of Bayes posterior distributions (as in 
Figure 2). 

• This computation is implemented by reweighting the bootstrap replica- 
tions rather than by drawing observations directly from the posterior dis- 
tribution as with MCMC [formulas (3.7), (3.8)]. 

• The necessary weights are easily computed in exponential families for any 
prior, but are particularly simple starting from Jeffreys invariant prior, in 
which case they depend only on the deviance difference A(/3) [(3.9)-(3.12), 
(3.21), (4.7), (5.5)]. 

• The deviance difference depends asymptotically on the skewness of the 
family, having a cubic normal form (3.29). 

• In our examples, Jeffreys prior yielded posterior distributions not much 
different than the unweighted bootstrap distribution. This may be un- 
satisfactory for single parameters of interest in multiparameter families 
(Figure 3). 

• Better uninformative priors, such as the Welch-Peers family or refer- 
ence priors, are closely related to the frequentist BCa reweighting formula 
[(2.17), Figures 2 and 6]. 

• Because of the i.i.d. nature of bootstrap resampling, simple formulas exist 
for the accuracy of posterior computations as a function of the number B 
of bootstrap replications. [Importance sampling methods can be unstable, 
so internal accuracy calculations, as suggested following (6.2), are urged.] 
Even with excessive choices of B, computation time was measured in 
seconds for our examples (6.2). 

• An efficient second-level bootstrap algorithm ( "bootstrap-after-bootstrap" ) 
provides estimates for the frequentist accuracy of Bayesian inferences 
[(6-3H6.6)]. 

• This can be important in assessing inferences based on formulaic priors, 
such as those of Jeffreys, rather than on genuine prior experience (last 
column, Table 2 of Section 5). 

APPENDIX 

Transformation of coordinates: Let J(/3) be the Jacobian of the trans- 
formation 7 = m(/3), that is, the absolute value of the determinant of the 
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Hessian matrix (dPi/d'yj). Then f^(j) = fp{$)J{$) gives 

(A ,) Ul) Jj^= m ^ 



in (4.2), and 
(A.2) 



R{i) ~m~W)W)- m m 



since A (7) = A(/3) by the transformation invariance of the deviance. 
For any prior density vr(/3) we have tt (7) = vr(/3) J(/3) and 

ff (7)%) = n((3)J(f3)R(f3)J0)/J((3) 

(A.3) 

= J(J3)ir(P)R(J3). 

J(j3) acts as a constant in (A.3), showing that (4.4) is identical to (3.7). This 
also applies to Jeffreys prior, 7r Jeff (7), which by design is transformation 
invariant, yielding (4.5). 

Proof of Theorem 1. In a one-parameter exponential family, (3.13) 
and (3.14) give 

(A.4) - il>{a) = $ da + V(da) 2 /2 + U(daf/6 

and 

(A.5) P-P = Vda + U(da) 2 /2, 

where da = a — a, V = V(a), and U = U (a). Expression (3.12) for A can be 
written as 

(A.6) A = (/3- j3)da + 2[j3da-(ip-ip)]. 

Applying (A.4) and (A.5) reduces (A.6) to 

A = hU(daf = h\V 1/2 (a-a)} 3 

(A.7) 



= i 7[ y-i/2 (/3 _^ )] 3 = i^3 

with 7 = U /V 3 / 2 the skewness, the last line following from Z = y~ 1 / 2 (/3 — 
/3) = V l / 2 (a — a) (A.5). Standard exponential family theory shows that Z — > 
A/"(0, 1) under repeated sampling, verifying the theorem [remembering that 

the asymptotics here are for f3 ~ /§(•)> with /3 fixed]. The skewness 7 is then 
0{n~ 1 / 2 ), making A of order O p (ra -1 / 2 ). The first missing term in the Taylor 
expansion (A.7) for A is <5Z 4 /12, 5 the kurtosis, and is of order O^ra^ 1 ). 



BAYESIAN INFERENCE AND THE PARAMETRIC BOOTSTRAP 25 

The multiparameter version of Theorem 1 begins by considering a one- 
parameter subfamily of (3.1) now indexed by a rather than f3, 

(A.8) fM0) = f &+av 0) = e (a+avY$-^a + av) fQ 0^ 

where v is some fixed vector in MP; a here is not connected with that in 
(2.17). The deviance difference within is 

(A.9) AM(a) = A(d + au) 

since deviance is entirely determined by the two densities involved. 
The exponential family terms (3.1) for family f a (■) are 

a W = a, ft v) = v'p, /3 M = v'p, 

(A.10) 

v v p 

V&=v'Vv and # W W/j, 

j=i k=\ i=i 

giving skewness = JJW/yw 3 / 2 . Applying the one-dimensional result 
gives 

(A.ll) A(a + av) = -7 W 2 ( " )3 with Z<") = t/ ^~^ . 

6 (v'Vv) 1 ' 2 

Since v can be any vector, (A.ll) describes the asymptotic form of A(-) in 
the neighborhood of a. □ 

Proof of Theorem 2. For a single observation y ~ A/"d(M, E), let /i 
and /2 represent its density under (/xi,Ei) and (112,^2), respectively. Then 

2 log ^44 = log i4 + (v - m)'Mv - «0 

(A.12) /2fa) 

- (y-Mi)%G/-Mi)- 

But if y~ jV d (/ii,Ei), 

£; /l {(y-M2) / S2 1 (y- M2 )} 

(A.13) 

= (M2 - Mi)' S 2 (M2 - Mi) +trEiS 2 1 

while E^Ky — //i) / Sj~ 1 (y — fix)} = d. Taking the f\ expectation of (A.12) 
gives the deviance 

-D((Mi, s iMM2,E 2 )) 

(A.14) 

= log |E 2 |/|Ei| + (M2 - fii)'^2 (A*2 " Mi) + tr E X E2 1 - d 
for sample size n = 1. The deviance difference for sample size n 

(A.15) A = -{D((n, S), (ft, E)) - D((fL, S), (/x, E))} 

is then seen to equal (4.7). 
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Table 3 

BCa constants zo and a for our three examples 





6 


Zo 


a 


Student correlation 


0.498 


-0.069 





Student eigenratio 


0.793 


-0.222 





Prostate data Fdr(3) 


0.192 


-0.047 


-0.026 



The density of (//, S) from a N p (/j>, S) sample of size n is proportional to 

(A.16) {|S|- 1/2 e" n ^"^' s_1(A ~^ /2 }{|S| (n_ci " 2)/2 e~ ntrS_lf:/2 /|S| (n ~ 1)/2 } 
yielding (4.6). □ 

The BCa weights: The BCa system of second-order accurate bootstrap 
confidence intervals was introduced in Efron (1987) (Section 2 giving an 
overview of the basic idea) and restated in weighting form (2.17) in Efron 
and Tibshirani (1998). The bias correction constant zq is obtained directly 
from the MLE and the bootstrap replication 61,62, ■ ■ ■ ,6 b according to 

(A.17) Zo = $-\#{8i<8}/B}. 

DiCiccio and Efron (1992) discuss "ABC" algorithms for computing a, the 
acceleration constant. The program abc2 is available in the supplement to 
this article. It is very fast and accurate, but requires individual programming 
for each exponential family. A more computer-intensive R program, accel, 
which works directly from the bootstrap replications (f3i,ti) [as in (3.6) and 
(3.7)], is also available in the supplement. 

Table 3 shows zq and a for our three main examples. Notice the especially 
large bias correction needed for the eigenratio. 
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